[SharedOffloadRegion] Align blocks to page-size #43689
Conversation
|
Thanks @varun-sundar-rabindranath ! |
Thanks for taking a look @orozery. The fixes are the root cause of the test failure. The current set of tests dont invoke the SharedOffloadRegion. These are targeted tests that test the fs_tier directly. IMO Making sure that the CPU backing tensor is always aligned should be a separate test. what do you think ? |
Hi @orozery I have updated the tests to use SharedOffloadRegion directly for better tests and have updated SharedOffloadRegion to always align its rows to page size boundaries. PTAL! Thanks. |
orozery
left a comment
There was a problem hiding this comment.
Thanks @varun-sundar-rabindranath !
Can you please change the PR title and description to reflect it aligns CPU pages?
|
|
||
| self.page_size = mmap.PAGESIZE | ||
| self.num_blocks = num_blocks | ||
| self.total_size_bytes, self._row_stride = self._maybe_update_buffer_size( |
There was a problem hiding this comment.
This potentially violates the user's cpu_bytes_to_use, allocating more than the user allowed.
I think we want to add an alignment classvar in cpu/spec.py, which will be overrided in tiering/spec.py:
# CPUOffloadingSpec
class CPUOffloadingSpec(OffloadingSpec):
CPU_PAGE_SIZE_ALIGNMENT = 1
def __init__(self, vllm_config, kv_cache_config):
...
kv_bytes_per_offloaded_block = kv_bytes_per_block * self.block_size_factor
self.cpu_page_size_per_worker = round_up(
kv_bytes_per_offloaded_block // world_size,
self.CPU_PAGE_SIZE_ALIGNMENT,
)
self.num_blocks = (
int(cpu_bytes_to_use) // (self.cpu_page_size_per_worker * world_size)
if self.cpu_page_size_per_worker > 0
else 0
)
...
# TieringOffloadingSpec
class TieringOffloadingSpec(CPUOffloadingSpec):
CPU_PAGE_SIZE_ALIGNMENT = SharedOffloadRegion.PAGE_SIZE_ALIGNMENT
...
There was a problem hiding this comment.
Nice catch. I agree that cpu_bytes_to_use should be respected.
Concern:
Expanding individual cpu_page_size_per_worker looks like it'll break a invariants and introduce some hard-to-catch bugs. i.e. We are going from,
B0 |<--- B0 W0 ---><--- B0 W1 ---><--- B0 W2 --->|
B1 |<--- B1 W0 ---><--- B1 W1 ---><--- B1 W2 --->|
B2 |<--- B2 W0 ---><--- B2 W1 ---><--- B2 W2 --->|
...
where Bi - Block i ; Wj - Worker j
to
B0 |<--- B0 W0 ---***pad***><--- B0 W1 ---***pad***><--- B0 W2 ---***pad***>|
B1 |<--- B1 W0 ---***pad***><--- B1 W1 ---***pad***><--- B1 W2 ---***pad***>|
B2 |<--- B2 W0 ---***pad***><--- B2 W1 ---***pad****><--- B2 W2 ---***pad***>|
...
where Bi - Block i ; Wj - Worker j
One example is the assert in cpu <-> gpu transfer
assert cpu_page_size == gpu_page_size * block_size_factor
vllm/vllm/v1/kv_offload/cpu/gpu_worker.py
Line 154 in 52a31cc
This padding will have to be plumbed through and handled correctly.
Instead I propose doing,
B0 |<--- B0 W0 ---><--- B0 W1 ---><--- B0 W2 --->***pad***|
B1 |<--- B1 W0 ---><--- B1 W1 ---><--- B1 W2 --->***pad***|
B2 |<--- B2 W0 ---><--- B2 W1 ---><--- B2 W2 --->***pad***|
...
which is a looser constraint and can be handled directly in SharedOffloadRegion. Respecting cpu_bytes_to_use can be handled by,
- Allocating less num_blocks in
CPUOffloadingSpecwhen padding is involved (communicated via CPU_PAGE_SIZE_ALIGNMENT classvar or a constructor arg) - passing
cpu_bytes_to_usedirectly to SharedOffloadRegion. - And introducing padding in SharedOffloadRegion that respects alignment and
cpu_bytes_to_use
| region, tensor, mock_view = _make_region_tensor_and_view( | ||
| num_blocks=4, | ||
| block_elements=_BLOCK_ELEMENTS, | ||
| instance_prefix="test-fs-tier", | ||
| ) |
There was a problem hiding this comment.
Assuming we move the alignment code further up to spec.py, let's go back to a simple tensor allocation here, but using an aligned page size.
There was a problem hiding this comment.
Keeping this as the new set of changes still updates SharedOffloadRegion directly. PTAL. Thanks 🙌
| assert self.cpu_bytes_to_use >= aligned_kv_bytes_per_offloaded_block, ( | ||
| f"CPU space insufficient for offloading. {self.cpu_bytes_to_use=} " | ||
| f"{kv_bytes_per_offloaded_block=} " | ||
| f"{aligned_kv_bytes_per_offloaded_block=} " | ||
| f"{self.block_size_alignment=}" | ||
| ) |
There was a problem hiding this comment.
Why do we need this assert?
There was a problem hiding this comment.
This assert is making sure that the num_blocks is not 0. It indicates that the user should increase cpu_bytes_to_use to run with CPU offloading enabled.
| self, | ||
| vllm_config: VllmConfig, | ||
| kv_cache_config: KVCacheConfig, | ||
| block_size_alignment: int = 1, |
There was a problem hiding this comment.
Let's use a classvar instead of introducing a new init param
There was a problem hiding this comment.
Updated to introduce a classvar.
I am a bit uncomfortable with the semantic difference between CPUOffloadingSpec.BLOCK_SIZE_ALIGNMENT vs self.BLOCK_SIZE_ALIGNMENT seems like it increases the surface for bugs. what do you think ?
|
This pull request has merge conflicts that must be resolved before it can be |
38b0fd3 to
9c8baee
Compare
| assert int(cpu_bytes_to_use) >= self.kv_bytes_per_offloaded_block_pad, ( | ||
| f"CPU space insufficient for offloading. {cpu_bytes_to_use=} " | ||
| f"{self.kv_bytes_per_offloaded_block=} " | ||
| f"{self.kv_bytes_per_offloaded_block_pad=} " | ||
| f"{self.BLOCK_SIZE_ALIGNMENT=}" | ||
| ) |
There was a problem hiding this comment.
Same as the previous review: Why do we need this assert?
There was a problem hiding this comment.
I replied in the previous review - #43689 (comment)
There was a problem hiding this comment.
IMO This is a bit far fetched.
And if have a single block, what is it good for?
There was a problem hiding this comment.
I have reverted the assert.
but, I believe it is better to be defensive in this case - for example I think having zero num_blocks will fail mmap in SharedOffloadRegion because num_blocks is zero. What do you think ? maybe we should handle it elsewhere.
|
@varun-sundar-rabindranath can you please address the remaining nits? |
Hi @orozery I have addressed the comments. PTAL! thanks 🙌 |
orozery
left a comment
There was a problem hiding this comment.
Thanks @varun-sundar-rabindranath !
7773cf1 to
8414e03
Compare
8414e03 to
562e361
Compare
Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com>
Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com>
562e361 to
0b2a808
Compare
Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com> Co-authored-by: varun sundar rabindranath <vsundarr@redhat.com> Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com> Co-authored-by: varun sundar rabindranath <vsundarr@redhat.com> Signed-off-by: JisoLya <523420504@qq.com>
Purpose
Align blocks in SharedOffloadRegion to page_size so that O_DIRECT succeeds.
Changes:
Interface change:
Test Plan
Run
pytest -s tests/v1/kv_offload/test_fs_tier.pymultiple times locally.Test Result
Test passes.